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ADM-CLE APPROACH FOR DETECTING SLOW VARIABLES IN CONTINUOUS TIME 

MARKOV CHAINS AND DYNAMIC DATA* 

MIHAI CUCURINGUtt AND RADEK ERBANS 


Abstract. A method for detecting intrinsic slow variables in high-dimensional stochastic chemical reaction networks is developed 
and analyzed. It combines anisotropic diffusion maps (ADM) with approximations based on the chemical Langevin equation (CLE). 
The resulting approach, called ADM-CLE, has the potential of being more efficient than the ADM method for a large class of chemical 
reaction systems, because it replaces the computationally most expensive step of ADM (running local short bursts of simulations) 
by using an approximation based on the CLE. The ADM-CLE approach can be used to estimate the stationary distribution of the 
detected slow variable, without any a-priori knowledge of it. If the conditional distribution of the fast variables can be obtained 
analytically, then the resulting ADM-CLE approach does not make any use of Monte Carlo simulations to estimate the distributions 
of both slow and fast variables. 
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1. Introduction. The time evolution of a complex chemical reaction network often occurs at different time 
scales, and the observer is interested in tracking the evolution of the slowly evolving quantities (i.e., of the so 
called slow variables) as opposed to recording each and every single reaction that takes place in the system. 
Whenever a separation of scales exists, one has to simulate a large number of reactions in the system in order to 
capture the evolution of the slowly evolving variables. With this observation in mind, it becomes crucial to be 
able to detect and parametrize the underlying slow manifold corresponding to the slow variables intrinsic to the 
system. In the present paper, we introduce an unsupervised method of discovering the underlying hidden slow 
variables in chemical reaction networks, and of their stationary distributions, using the anisotropic diffusion map 
(ADM) framework [39]. 

The ADM is a special class of diffusion maps which have gained tremendous popularity in machine learning 
and statistical analysis, as a robust nonlinear dimensionality reduction technique, in recent years [36, 2, 10, 6]. 
Diffusion maps have been successfully used as a manifold learning tool, where it is assumed that the high 
dimensional data lies on a lower dimensional manifold, and one tries to capture the underlying geometric structure 
of the data, a setup where the traditional linear dimensionality reduction techniques (such as the Principal 
Component Analysis) have been shown to fail. In the diffusion maps setup, one constructs or is given a sparse 
weighted connected graph (usually in the form of a weighted k-Nearest-Neighbor graph, with each node connected 
only to its k nearest or most similar neighbors), and uses it to build the associated combinatorial Laplacian 
L = D — W, where W denotes the matrix of weights and D denotes a diagonal matrix with Du equal to the sum 
of all weights of the node i. Next, one considers the generalized eigenvalue problem Lx = XDx, whose solutions 
are related to the solutions of the eigenvalue problem Lx = Xx, where L — D~^W is a row-stochastic matrix 
often dubbed as the random walk normalized Laplacian. Whenever the pair (A, x) is an eigenvalue-eigenvector 
solution to Lx = Xx, then so is (I — A, x) for Lx = XDx. The (non-symmetric) matrix L can also be interpreted 
as a transition probability matrix of a Markov chain with state space given by the nodes of the graph, and entries 
Lij denoting the one-step transition probability from node i to j. 

In the diffusion map framework, one exploits a property of the top nontrivial eigenvector of the graph 
Laplacian of being piecewise constant on subsets of nodes in the domain that correspond to the same state 
associated to the underlying slow variable. We make this statement precise in Section 4, and further use the 
resulting classification in Section 5 to propose an unsupervised method for computing the stationary distribution 
of the hidden slow variable, without using any prior information on its structure. Since the top eigenvectors 
of the above Laplacian define the coarsest modes of variation in the data, and have a natural interpretation in 
terms of diffusion and random walks, they have been used in a very wide range of applications, including but 
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not limited to partitioning [41, 40], clustering and community detection [34, 45, 33], image segmentation [38], 
ranking [47, 17], and data visualization and learning from data [7, 36]. 

The main application area studied in this paper are stochastic models of chemical reaction networks. They 
are written in terms of stochastic simulation algorithms (SSAs) [21, 22] which have been used to model a number 
of biological systems, including the phage A lysis-lysogeny decision circuit [1], circadian rhythms [44], and the 
cell cycle [27]. The Gillespie SSA [21] is an exact stochastic method that simulates every chemical reaction, 
sampling from the solution of the corresponding chemical master equation (CME). To characterize the behavior 
of a chemical system, one needs to simulate a large number of reactions and realizations, which leads to very 
computationally intensive algorithms. For suitable classes of chemically reacting systems, one can sometimes 
use exact algorithms which are equivalent to the Gillespie SSA, but are less computationally intensive, such as 
the Gibson-Bruck SSA [19] and the Optimized Direct Method [5]. However, these methods also stochastically 
simulate the occurrence of every chemical reaction, which can be a computationally challenging task for systems 
with a very large number of species. One way to tackle this problem is to use parallel stochastic simulations [28]. 
In this work, we discuss an alternative approach which does not make use of parallel stochastic simulations, but 
at the same time, the proposed approach can also benefit from large processing power and parallel computing, 
as many steps of our proposed algorithms are highly parallelizable. 

An alternative approach to treating the molecular populations as discrete random variables, is to describe 
them in terms of their continuously changing concentration, which can be done via the Ghemical Langevin 
equation (CLE), a stochastic differential equation that links the discrete stochastic simulation algorithm with the 
deterministic reaction rate equations [20]. Although such an approach can be less computationally expensive, 
it comes with the disadvantage that, for certain chemical systems, it can lead to negative populations [46]. In 
addition, note that none of the above approaches takes explicit advantage of the separation of scales if one exists, 
a which we will make use of in this paper as detailed in Sections 4 and 5. 

It is often the case that a modeller is not interested in every single reaction which takes place in the system, 
but only in the slowly evolving quantities. Certain systems possess multiple time scales, meaning that one has to 
simulate a large number of reactions to reveal the slow dynamics. Several algorithms for chemical networks with 
fast and slow variables have already been developed in the literature. The authors of [25] proposed to simulate 
the fast reactions using Langevin dynamics, and the slow reactions using the Gillespie algorithm. This approach 
requires both the time scale separation and a sufficiently large system volume; however the latter constraint can be 
avoided using probability densities of the fast species conditioned on the slow species, and estimating the effective 
propensity functions of the slow species [3, 4, 13, 37, 43]. An alternative approach to simulating the evolution of 
the slow variables while avoiding doing so for the fast variables, is to estimate the probability distribution of the 
slow variables [18]. The key point in this approach is to use short bursts of appropriately initialized stochastic 
simulations to estimate the drift and diffusion coefficients for an approximating Fokker-Planck equation written 
in terms of the slow variables [15]. The success of this approach has already been demonstrated in a range of 
applications including materials science [24], cell motility [14], and social behavior of insects [42]. 

Ref. [8] introduces the conditional stochastic simulation algorithm (GSSA) that allows one to sample effi¬ 
ciently from the distribution of the fast variables conditioned on the slow ones [8] , and to estimate the coefficients 
of the effective stochastic differential equation (SDE) on the fly via a proposed constrained multiscale algorithm 
(GMA) algorithm. The GMA can be further modified by estimating the drift and diffusion coefficients in the 
form given by the GLE for the slow subsystem, which requires the estimation of effective propensity functions of 
slow reactions [9] . The main question we plan to address in our present work builds on and combines two already 
existing ideas investigated in [8] and [39], and brings several computational and algorithmic improvements. The 
above-mentioned CSSA algorithm explicitly makes use of the knowledge of the slow variables (often unavailable 
in many real applications), a drawback we plan to address as explained later in Section 4, where, driven by the 
top eigenvector of an appropriately constructed Laplacian, we discover the underlying slow variable. In doing 
so, we make use of the ADM framework [39] which modifies the traditional diffusion map approach to take into 
account the time-dependence of the data, i.e., the time stamp of each of the data points under consideration. By 
integrating local similarities at different scales, the ADM gives a global description of the data set. 

The rest of this paper is organized as follows. In Section 2 we provide a mathematical framework for 
multiscale modeling of stochastic chemical reactions networks and detail the two chemical systems via which 
we use to illustrate our approach. In Section 3 we introduce the ADM-GLE framework and we highlight its 
differences from the approaches which were previously introduced in the literature. In Section 4 we propose a 
robust mapping from the observable space to the “dynamically meaningful” inaccessible space, that allows us to 
recover the hidden slow variables. In Section 5 we introduce a Markov-based approach for approximating the 
steady distribution of the slow variable, and compare our results with another recently proposed approach. We 
conclude with a summary and discussion of future work in Section 6. 
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2. Problem formulation. A multi-scale modeling framework for stochastic chemical reaction networks 
can be formulated as follows. We consider a well-mixed system of I chemical species, denoted by Ai, A 2 ,..., 
that interact through m reaction channels i?i, i? 2 , • • ■, Rm in a reactor of volume V. We denote the state of the 
system by X(t) = [Ai(t), A 2 ,..., A^(t)], where Xi{t), i = 1,2,. represents the number of molecules of type 
Xi in the system at time t. With a slight abuse of notation, we interchangeably use A^ to denote the type i of 
the molecule. In certain scenarios, one may assume that the reactions can be classified as either fast or slow, 
depending on the time scale of occurrence [3]. As expected, the fast reactions occur many times on a timescale 
for which the slow reactions occur with very small probability. As defined in [3], the fast species denoted by F 
are those species whose population gets changed by a fast reaction. Slow species (denoted by S) are not changed 
by fast reactions. Considering that slow species are not only species from the set {Ai, A 2 ,..., Xi}, but also their 
functions which are not changed by fast reactions, the components of the fast and slow species can be used as a 
basis for the state space of the system, whose dimension equals the number of linearly independent species. 

For each reaction channel Rj, j = 1,2,... ,m, there exists a corresponding propensity function aj = aj{x), 
such that aj dt denotes the probability that, given X{t) = x, reaction Rj occurs within the infinitesimal time 
interval [t,t dt). We denote by v the stochiometrix matrix of size m x i, with entry Vji denoting the change 
in the number of molecules of type Xi caused by one occurrence of reaction channel Rj. The continuous time 
discrete in space Markov chain can be further approximated by the CLE for a multivariate continuous Markov 
process [20]. Using time step At, the Euler-Maruyama discretization of the CLE is given by 

771 771 

A,(t -b At) = A,(t) -b aj{X{t)) + ^ Vji^ aj{X{t)) Nj{t) for alH = 1, 2,..., £, (2.1) 

i=i 

where Aj, with another slight abuse of notation, denotes a real-valued approximation of the number of molecules 
of the z-th chemical species, i = 1,2,... Here, Nj(t), j = 1,2,... ,m, denote the set of m independent normally 
distributed random variables with zero mean and unit variance. 

2.1. Illustrative example CS-I. As the first illustrative example, we consider the following simple 2- 
dimensional chemical system, with the two chemical species denoted by Ai and A 2 (i.e., i = 2) which are subject 
to four reaction channels Rj, j = 1,2,3,4 (i.e., m = 4), given by 


0 


Ai 




( 2 . 2 ) 


Throughout the rest of this paper, we shall refer to the chemical system (2.2) as CS-1 (i.e. “chemical system 1”). 
We label by Rj the reaction corresponding to the reaction rate subscript kj, j = 1,2, 3,4, and note that each 
reaction Rj has associated a propensity function aj(t) given by [21] 


ai{t) = kiV, 02 ( 1 ) = ^2 Ai(t), a 3 (t) = fcs A 2 (t), a 4 {t) = k 4 X 2 {t), (2.3) 


where V denotes the volume of the reactor. We consider the system with the following dimensionless parameters 


kiV = 100, ^2 = ^3 = 200 and k 4 = 1. 


(2.4) 


We plot in Figure 2.1(a) the time evolution of the two different species in system (2.2), together with the slow 
variable S = (Ai -b A2)/2, starting from initial conditions Ai(0) = A2(0) = 100. As the figure shows, the system 
variables Ai and A 2 are changing very frequently (thus we label them as fast variables), while the newly defined 
variable S changes very infrequently and can be considered to be a slow variable. 

Following [26], for the chemical system in (2.2) comprised only of monomolecular reactions, it is possible to 
compute analytically the stationary distribution of the slow variable S, since the joint probability distribution of 
the two variables Ai and A 2 is a multivariate Poisson distribution 


yii yn2 

P(Ai = m, A 2 = 712 ) = exp(-Ai - A 2 ) 

71 }! no! 


with parameters given by 


Ai 


kiV{kz + k4) 

k2k4 


100.5 and A 2 = ^ = 100. 

A:4 


(2.5) 


( 2 . 6 ) 
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Fig. 2.1. (a) Trajectories of the CS-I considered in (2.2) showing the behavior of the slow variable S = X 2 )in contrast 

to the behavior of the fast variables Xi and X 2 , where the system propensity functions and parameters are given by (2.3) and (2.4). 
(b) Trajectories of the CS-II considered in (2.7), showing the slow behavior of the variable S = Xi + 2 X 2 in contrast to the fast 
behavior of variables Xi and X 2 , where the system parameters are given by (2.8). 


2 . 2 . Illustrative example CS-II. The second example is taken from Ref. [8]. We shall refer to it as CS-II 
from now on. We consider the following system 

X2^Xi+X2, %^Xi, Xi^Xi^X2, ( 2 . 7 ) 

^2 ^4 ^6 

involving two molecular species Xi and X 2 , whose reactions Ri, R 2 , ■. ■, Re have the propensity functions given 

by 


ai{t) = kiX 2 {t), a 2 {t) = k 2 Xi{t)X 2 {t)/V, a 3 {t) = ksV, 

ai{t) = k.Xiit), aeit)=ke ^ ag(t) = ^ 6 ^ 2 (t), 

where V denotes the system volume. Figure 2.1(b) shows a simulated trajectory of this chemical system using 
the Gillespie algorithm for the following dimensionless parameters [8] 

fci = 32, k 2 = 0.041/, kaV = 1475, k^ = 19.75, k^ = lOR, fcg = 4000, (2.8) 

where we use 1/ = 8. Note that in this second example, reactions Re and Rq are occurring on a much faster 
timescale than the other four reactions Ri, R 2 , R 3 and R 4 . A natural choice for the slow variable is S' = X 1 -I- 2 A 2 , 
which is invariant with respect to all fast reactions [8], as we illustrate in Figure 2.1(b). 

2.3. Main problem. Our end goal in the present paper is to propose an algorithm that efficiently and 
accurately estimates the stationary probability density of the hidden slow variable S, without any prior knowledge 
of it. The approach we propose builds on the anisotropic diffusion map framework (ADM) to implicitly discover 
the mapping from the observable state space to the dynamically meaningful coordinates of the fast and slow 
variables, as previously introduced in [39], and on the CLE approximation (2.1). 

3. ADM-CLE approach. Let us consider example CS-11, and assume that s = s{xi,X 2 ) = xi + 2 x 2 
and / = f{xi,X 2 ) = Xi are the slowly and rapidly changing variables, respectively. They together define 
a mapping g : (xi,X 2 ) >—t (s,/) from the observable state variables xi and X 2 in the accessible space O to 
the “dynamically meaningful” (but in more complicated examples inaccessible) slow variable s and the fast 
accessible variable /, both in space H. In other words, g maps {xi,X 2 ) >—t {xi + 2 x 2 , xi), and conversely its 
inverse h := g-^ : (s, /) H. (/, ^). 

The approach introduced in [39] exploits the local point clouds generated by many local bursts of simulations 
at each point (a;i,a; 2 ) in the observable space O. Such observable local point clouds are the image under h of 
similar local point clouds in the inaccessible space R (at corresponding coordinates (s, /) such that h{s, f) = 
(cci, 2 ^ 2 )), which, due to the separation of scales between the fast and slow variables / and s, have the appearance 
of thin elongated ellipses. It is precisely this separation of scales that we leverage into building a sparse anisotropic 
graph Laplacian L in the observable space, and use it as an approximation of the isotropic graph Laplacian in 
the inaccessible space R. As we shall see, the top nontrivial eigenvector of L will robustly indicate all pairs of 
original states (xi, X 2 ) that correspond to the same slow variable S = s (where s = xi -I- 2 x 2 for CS-II). In other 
words, we discover on the fly the structure of the slow variable S, and further integrate this information into a 
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Markov-based method for estimating its stationary distribution P(S' = s), while also computing along the way an 
analytical expression for the conditional distribution of the fast variable given the slow variable P(F = f\S = s). 

Singer et al. [39] run many local bursts of simulations for a short time step 6 t starting at (xi,X 2 )- Such 
trajectories end up at random locations forming a cloud of points in the observable plane O, with a bivariate 
normal distribution with 2x2 covariance matrix S. The shape of the resulting point cloud is an ellipse, whose 
axes reflect the dynamics of the data points. In other words, when there is a separation of scales, the ellipses are 
thin and elongated, with the ratio between the axis of the ellipse given by the ratio 



^2 


(3.1) 


of the two eigenvalues of S. The first eigenvector corresponding to Ai points in the direction of the fast dynamics 
on the line xi 2x2 = s, while the second one points in the direction of the slow dynamics. In particular, 
T is a small parameter, i.e. 0 < r ^ 1. In general, we wish to piece together locally defined components 
into a globally consistent framework, a nontrivial task when the underlying unobservable slow variables (or the 
propensity functions of the system) are complicated nonlinear functions of the observable variables in O. 

The construction of the ADM framework in [39] relates the anisotropic graph Laplacian in the observable 
space O with the isotropic graph Laplacian in the inaccessible space H. In that setup, each of the N data 
points i = 1,2, lives in an ^-dimensional data space. For both CS-I and CS-II, the data is two- 

dimensional, thus £ = 2. For the former system, we consider each lattice point in the domain [50,150] x [50,150], 
hence there are N = 101^ = 10,201 states, while for the latter one we consider the domain [1,110] x [1,110], 
i.e. N = 110^ = 12,100. Throughout the paper, we will often refer to the N data points = {xi,X 2 )^^\ 
i = 1,2,..., N, as O-states of the chemical system. The ADM [39] then generates ensembles of short simulation 
bursts at each of the N points in the data set, computes the averaged position after statistically averaging over 
the many simulated trajectories, and obtain an estimate of the local 2x2 covariance matrix For each data 
point x(*\ the inverse of is computed and symmetric S-dependent squared distance between pairs of data 
points in the two-dimensional observable space (given by (3.3) below) is defined. The ADM framework then 
uses this dynamic distance measure to approximate the Laplacian on the underlying hidden slow manifold. We 
provide further details on the anisotropic diffusion maps framework in Section 3.2. We now highlight the first 
difference between the approach taken in this paper and in [39]. 

3.1. Replacing short simulation bursts by the CLE approximation. The local bursts of simulations 
initiated at each data point in order to estimate the local covariances may be computationally expensive to 
estimate. In this paper, we bypass these short bursts of simulations by using an approximation given by the CLE 
(2.1), which allows for a theoretical derivation of the local 2x2 covariance matrices. Using (2.1), we obtain 


Cov(A,(t -I- At),Xk{t + At)) = E[Xi{t + At)Xk{t + At)] - E[Xi{t + At)]E[Xk{t + At)] 


= At'^VjiVjkajiX). 

i=i 


(3.2) 


Computing the eigen-decomposition of a local covariance matrix is analogous to performing the Principal Com¬ 
ponent Analysis on the local cloud of points, generated by the short simulations bursts. The advantage of (3.2) 
over the computational approach used in [39] is that can be computed at each data point without running 
(computationally intensive) short bursts of simulations. The error of the CLE approximation depends on the 
values of coordinates of the data point x(*\ i.e. on the system volume V [23, 12]. In the case of CS-I or CS-II, the 
most probable states contain about one hundred molecules of each chemical species and the CLE approximation 
(3.2) is well justified. 

3.2. Anisotropic diffusion kernels. The next task is the integration of all local principal components 
into a global framework, with the purpose of identifying the hidden slow variable. We estimate the distance 
(and hence the similarity measure) between the slow variables in the underlying inaccessible manifold using 
the anisotropic graph Laplacian [39]. We derive a symmetrized second order approximation of the (unknown) 
distances in the inaccessible space "H, based on the Jacobian of the unknown mapping from the inaccessible to 
the observable space. The E-dependent distance between two O-states is given by 


(^(xi,X 2)'^"K (xi,X2)'^-^^'j = ^ (^(xi,X2)'^"^ - {xi,X2)^^'>'^ 


.-1 


— 1 


Xi,X2)‘''''’ - {X1,X2 


5 


(3.3) 


and represents a second order approximation of the Euclidean distance in the inaccessible (s, r/)-space 




(gW _ 50 -)) 2 ^ 


(3.4) 


where the last approximation is due to the fact that t is a small parameter, see (3.1). Note that it is also 
possible to extend (3.4) to higher dimensions, as long as there exists a separation of scales between the set of slow 
variables and the set of fast variables [39]. Using approximation (3.3)-(3.4) of the distance between states of the 
slow variable, we next construct (an approximation of) the Laplacian on the underlying hidden slow manifold, 
using the Gaussian kernel as a similarity measure between the slow variable states. We build an N x N similarity 
matrix W with entries 


Wij = exp 


-d|[(xi,X2)^"\ (X1,X2)^-^^] 

f^2 


exp 


(s« sa))2^ 


i,j = l,2,...,N, (3.5) 


where the single smoothing parameter e (the kernel scale) has a two-fold interpretation. On one hand, e denotes 
the squared radius of the neighborhood used to infer local geometric information, in particular Wij is 0(1) when 
and are in a ball of radius -y/i, thus close on the underlying slow manifold, but it is exponentially small 
for states that are more than apart. On the other hand, e represents the discrete time step at which the 
underlying random walker jumps from one point to another. We refer the reader to [31] for a detailed survey 
of random walks on graphs, and their applications. We normalize W using the diagonal matrix D to define the 
row-stochastic matrix L by 


N 

= L = D-^W. (3.6) 

1=1 

Since L is a row-stochastic matrix, it has eigenvalue Aq = 1 with trivial eigenvector $o = (1) ■ j 1)^- The 

remaining eigenvalues can be ordered as 

1 = Aq > Ai > A 2 > ... > A^V-l. 

We denote by the corresponding eigenvectors, i.e. L^i = Xi^i. The top d nontrivial eigenvectors of the 
random-walk anisotropic Laplacian L describe the geometry of the underlying d-dimensional manifold [16], i.e. 
the i-th data point is represented by the following diffusion map 

($i(z),$2(l),...,$rf(0), i = l,2,...,iV, (3.7) 

where ^j{i) denotes the z-th component of the eigenvector However, note that some of the considered 
eigenvectors can be higher harmonics of the same principal direction along the manifold, thus in practice one 
computes the correlation between the computed eigenvectors before selecting the above d eigenvectors chosen to 
parametrize the underlying manifold. For the two chemical systems considered in this paper, we show in the 
remainder of this section how the top (i.e., d = 1) non-trivial eigenvector of L can be used to successfully recover 
the underlying slow variable. 

Using the stochasticity of L, we can interpret it as a random walk matrix on the weighted graph G = (U, E), 
where the set of nodes corresponds to the original observable states (xi, X 2 )^*\ z = 1, 2,..., TV (and implicitly to 
states of the slow variable), and there is an edge between nodes z and j if and only if Wij > 0. The associated 
combinatorial Laplacian is given hy L = D — W. Whenever the pair (A^, $i) is an eigenvalue-eigenvector solution 
to L^i = Xi^i, then so is (1 — Xi, ^i) for the generalized eigenvalue problem L^i = XiD^i. We plot in Figures 
3.1(a) and 3.2(a) the spectrum of the combinatorial Laplacian L = D — W, for the chemical systems CS-I and 
CS-II. In Figures 3.1(b) and 3.2(b) we color the states of the network with the top non-trivial eigenvector $ 1 . 

Before considering the top eigenvector of L for determining the underlying slow variable and estimating its 
stationary distribution, we propose to use a sparse graph Laplacian which differs from the ADM method in [39] , 
where the Laplacian matrix is associated to a complete weighted graph. However, using a complete graph leads 
to computing the E-dependent squared distance in equation (3.3) for any pair of nodes, thus an 0{N‘^) number 
of computations is used. In light of the approximation (3.4), a pair of points which are far away in the observable 
space (i.e., for which (i|.((xi, X 2 )^*\ (xi, X 2 )^^^) is large) denotes a pair of corresponding states of the slow variable 
which are also far away in the inaccessible space. Thus we do not have to do such computations, because points 
far away in the unobservable space will have an exponentially small similarity Wij close to 0. The fact that the 
shape of the local point cloud is an ellipse provides some insight in this direction. Thus we will build a sparse 
graph of pairwise measurements and no longer compute the E-dependent distance between all points of the data 
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(b) 


(c) 


Weighted degree 


(d) 



Fig. 3.1. Illustrative Example CS-I. (a) The top 500 eigenvalues of the associated combinatorial Laplacian, i.e. (1 — A^) for 
2 = 1,2,..., 500. (b) The coloring of the nodes of G (states of the observable space) according to their corresponding entry in the 
top eigenvector of L given by (3.6). (c) The weighted degree distribution of the ground state graph G. (d) A scatterplot of the 
states of the system, colored by their weighted degree. 


(a) 



(b) (c) (d) 
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Fig. 3.2. Illustrative Example CS-II. (a) The top 500 eigenvalues of the associated combinatorial Laplacian, i.e. (1 — A^) for 
2 = 1,2,..., 500. (b) The coloring of the nodes of G (states of the observable space) according to their corresponding entry in the 
top eigenvector of L given by (3.6). (c) The weighted degree distribution of the ground state graph G. (d) A scatterplot of the 
states of the system, colored by their weighted degree. 


set, but only between a very small subset of the points. The spectrum of the covariance matrix in particular 
the ratio r of its two eigenvalues given by (3.1), guides us in building locally at each point, a sparse ellipsoid-like 
neighborhood graph. 

For each observable state we build a local adjacency graph, denoted by G,, in the shape of an 

ellipse pointing in the direction of the fast dynamics, and whose small axis points in the direction of the slow 
dynamics. Figure 3.3 shows an example of such a local 1-hop neighborhood graph Gi, where the central node 
(a;i,a; 2 )*'b is connected to all points contained within the boundaries of an appropriately scaled ellipse centered 
at (xi, a; 2 )*'d. Finally, we define the sparse graph G of size N x N associated to the entire network as the union of 
all locally defined ellipsoid-like neighborhood graphs G = Ui=i ^'z. Note that the union graph G is still a simple 
graph, with no self edges and no multiple edges connecting the same pair of nodes. We compute the distance 
ds by (3.3) (and thus the similarity W,,) between a pair of nodes (xi,a; 2 )b) and (xi,X 2 )^^^ if and only if the 
corresponding edge (i,j) exists in G. 

We plot in Figures 3.1(c) and 3.2(c) the histogram of the weighted degrees of the nodes in the weighted 
graph W defined in (3.5), while Figures 3.1(d) and 3.2(d) show a scatterplot of the states of the system, where 
each state i is colored by its weighted degree, i.e., the sum of all its outgoing weighted edges Wij,j = 1 ,2,..., n. 
Throughout the computational examples in this paper, the smoothing parameter e which appears in (3.5) was 
set to e = 0.1. In contrast to the approach in [39] which computes all O(iV^) pairwise similarities, the sparsity 
of G (and thus of the associated graph Laplacian L) in our approach only requires the computation of a much 
smaller number of distances, as low as linear, depending on the discretization of the domain, and makes it 
computationally feasible to solve problems with thousands or even tens of thousands of nodes. 

4. A robust mapping from the observable space O to the “dynamically meaningful” inaccessible 
space "H. As a first step towards partitioning the nodes of the original graph G and detecting the associated 
slow variable, we sort the entries of the top eigenvector $i, which we then denote by with $i(l) > $i(2) > 

• This sorting process defines permutation a of the original index set i = 1,2,...,A so that 

$i((T(i)) = $i(z). We consider the increments between two consecutive (sorted) values 

Sz = $i(j) - $i(j -f 1), 
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i = 1,2,..., A - 1. 


(4.1) 




















Fig. 3.3. The local neighborhood graph Gi at a given node the shape is an ellipsoid whose axis ratio is given by the 

ratio of the eigenvalues r of the local covariance matrix d‘^{{xi,X 2 )^'^\{xi,X 2 )^^^), i.e. by (3.1). The corresponding eigenvectors 
are used to calculate the orientation of the ellipse. 



Fig. 4.1. Illustrative example CS-I. (a) Jump sizes (4.1) of the sorted eigenvector of the sparse anisotropic graph Laplacian 
L. (b) The correlation of with the ground truth slow variable s = {xi -\-X2)l2. (c) Zoom-in on the sorted top eigenvector 
(the colors denote the corresponding slow variable) showing that is almost piece-wise constant on the bins that correspond to 
distinct slow variable states. The kernel scale is set to e = 0.1. 


Next, we sort the vector of such increments, denote its entries by and show in Figure 

4.1(a) (resp. Figure 4.2(a)) the top 300 (resp. top 420) largest such increments Si for illustrative example 
CS-I (resp. CS-II). Note that this already give us an idea about the number of distinct slow states in the 
system, a set which we denote by S. Ideally, the difference <i)i(i) — d>i(j) in the entries of the top eigenvector 
corresponding to two observable states {xi,X 2 )^'‘^ and {xi,X 2 )^^'^ that belong to the same slow variable s (i.e., 
xf + 2xf = xf + 2x^f = s for illustrative example CS-II) should be zero or close to zero, in which case 
we expect that only approximately |5| of the — 1 increments Si are significantly larger than zero, while the 
remaining majority are zero or close to zero. 

In Figures 4.1(b) and 4.2(b) we highlight the correlation between the entries of the top non-trivial eigenvector 
$1 and the corresponding slow variable S. In Figures 4.1(c) and 4.2(c), we zoom on a subset of states to make the 
point that the eigenvector <I>i is almost constant on the O-states that correspond to the same value of the slow 
variable. The plots in Figures 3.1(b) and 3.2(b) show a coloring of the networks generated by the two chemical 
systems CS-I and CS-II, based on the first nontrivial eigenvector of the associated sparse Laplacian L. Note that 
the eigenvector looks almost piecewise constant along the lines that point to the evolution of the fast variable, 
for a given value of the slow variable (S = {Xi + X2)/2 for CS-I and S' = -|- 2 X 2 for CS-II), yet nowhere 

along the way we have input this information into the method. In the next step we use this top eigenvector 
to identify all nodes of the graph (original states of the chemical system) that correspond to the same value of 
the underlying slow variable. In other words, all nodes whose corresponding eigenvector entries are between an 
appropriately chosen interval (that we shall refer to a bin) will be labeled as belonging to the same slow variable 
S. In other words, we seek a partition of the observable states in O, i.e., of the nodes of G, such that all original 
states (xi,a: 2 )^*^ with the same value of the corresponding slow variable s((a;i, X 2 )*-*^) end up in the same bin. 
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(a) 


(b) 


(c) 



Eigenvector entry 


n=12100; Spearman Correlation=0.99999 




5000 5200 5400 5600 5800 6000 

Index 


Fig. 4.2. Illustrative example CS-II. (a) Jump sizes (4.1) of the sorted eigenvector of the sparse anisotropic graph Laplacian 
L. (b) The correlation o/^i with the ground truth slow variable s = xi -\-2x2. (c) Zoom-in on the sorted top eigenvector (the 
colors denote the corresponding slow variable) showing that is almost piece-wise constant on the bins that correspond to distinct 
slow variable states. The kernel scale is set to e = 0.1. 


Our goal is to find a partition V = {T^i, 7^2, ■ • ■, "Pfe} of O such that 


k 

Vj = e O \ s{xi,X 2 ) = qj} and [J'Pj=0, (4.2) 

i=i 

where k denotes the number of distinct values qj, j = 1, 2,..., fc, of the slow variable S. As an example, in the 
case of CS-I given by (2.2), the partition Vj = {(1,99), (2, 98),..., (99,1)} corresponds to all nodes in the graph 
for which the value of the associated slow variable is constant qj = 50. The key observation we exploit here is 
that the top eigenvector of the Laplacian matrix is almost piecewise constant on the bins that partition O, since 
the nodes of G that correspond to the same value of the slow variable have a very high pairwise similarity, with 
Wij very close to 1 . 

One may also interpret the above problem as a clustering problem, where the similarity between pairs 
of points is given by (3.5), and is such that nodes that belong to the same bin have a much higher similarity 
compared to nodes that belong to two different bins, an effect due to the strong separation of scales. In the case of 
illustrative example CS-I, the clusters correspond to lines in the two-dimensional plane such that {xi -\-X 2 ) 1‘i = c, 
for a constant c. We point out the interested reader to the work of [32], where the top eigenvectors of the 
random walk Laplacian are used for clustering. While in practice one uses the top several eigenvectors as the 
reduced eigenspace where clustering is subsequently performed, in our case the top eigenvector alone suffices to 
capture the many different clusters (i.e., bins), a fact we attribute to the strong separation of scales exhibited 
by the illustrative chemical systems CS-I and CS-II. If several eigenvectors were considered then one could use a 
clustering algorithm, such as k-means or spectral clustering [38, 45], to obtain the partitioning (4.2). However, 
a simpler method has been successfully used for the 1 -dimensional eigenspace in both examples we considered. 
It is described as follows. Recall the sorted vector of increments > ^2 > • ■ • > ^Af-i defined in equation (4.1), 
and consider the set of the k — \ largest such increments {i5i, < 52 , ■. •, 5k-i\ where > <52 > ■ • • > Next, 

from the sorted eigenvector $1 we extract the position of the entries whose associated increment (with respect 
to its right-next neighbor index) belongs to {di, 82 , ■ ■ ■ ^ 5fc_i}. In others words, we compute 

bt = arg l>i(f) - l>i(*-I-1) = 5t, where t = 1,2,..., fc - 1, (4.3) 

and 60 = 0 and bk = N. Finally, we compute an estimated partition V oi O hy Vq = {i G O \ a{i) G {bq-i,bq]}, 
where g = 1, 2,..., fc, and a is the permutation of the original index set i = 1,2,..., N, given in the definition of 
# 1 , i.e. $i(cr(i)) = $i(*). 

To illustrate the correctness of our proposed technique, we compute the Jaccard index between each proposed 
partition set Vj, j = 1 , 2 ,... ,k and each ground truth partition set Vi, i = 1, 2,..., |5|: 

Jij = i ’ ’^here i = 1,2,..., |5|, j = l,2,...,fc, (4.4) 

I U Pj\ 

and show a heatmap of this matrix in Figure 4.3(b). Since we are interested not only in the partition, but also in 
recovering the ordering of the slow variable, we show in Figure 4.3(c) the correlation between the ground truth 
ordering of the slow variable and our recovered ordering. Note that we can only recover the ordering up to a 


9 









Fig. 4.3. Illustrative example CS-I. (a) The eigenvector-based slow variable cardinality. The Theta score © is the smoothness 
measure of the bin cardinalities, defined in (4.5). The algorithm perfectly recovers the ground truth partition, (b) The heatmap of 
the pairwise Jaccard similarity matrix given by (4.4). (c) The correlation between the ordering of the ground truth slow variable and 
the eigenvector recovered slow variable, (d) The Jaccard index of the pairwise matched bins (from the maximum matching). 


(a) 


(b) 


Cardinality true slow var. 151 = 328, 0 = 108 


Cardinality eig slow var. 151 = 328, 0 = 7206 



(c) 


Cardinality eig slow var. 151 = 328,0 = 7206 



Slow variable (zoom-in) 


Fig. 4.4. Illustrative example CS-II. (a) The ground truth slow variable cardinality, (b) The eigenvector-based slow variable 
cardinality. 0 captures the smoothness of the bin cardinalities, as introduced in (4.5). (c) Plot of the cardinalities of a subset of 
bins, showing the erroneous bin assignments in the eigenvector-based partition. This is a zoomed-in version of panel (b). 


global sign, since —is also an eigenvector of L. Finally, we compute the maximum weight matching (using, for 
example, the Hungarian method [29]) in the bipartite graph with node set 7^ U "P and edges across the two sets 
given by matrix J in (4.4). In Figure 4.3(d) we plot the Jaccard index of the matched partitions. For the first 
chemical system CS-I, note that the algorithm perfectly recovers the ground truth partition. In Figure 4.4 we 
present the outcome of the binning algorithm for the illustrative example CS-II, which is no longer satisfactorily 
by itself and requires further improvement. Though the bin cardinalities in the initial solution visually resemble 
the ground truth, there are numerous mistakes being made. To illustrate this, for a given partition V, we compute 
the following measure of continuity of the recovered bin cardinalities 

s-i 

0P = 0(Pi, ...,Vs) = ^(inl - |P.+i|)'. (4.5) 

i=l 

In other words, 0-p captures the squared difference in the cardinalities of two consecutive bins. For the chemical 
system CS-II, the ground truth yields a score 0 = 108, while for the eigenvector-recovered solution 0 = 7206, 
thus indicating already that numerous misclassifications are being made, without even computing the Jaccard 
similarity matrix (4.4) between the two partitions. To this end, we introduce in the next subsection a heuristic 
denoising technique followed by a truncation of the domain, which altogether lead to a better partitioning of O 
into groups of states that correspond to the same slow variable. 

4.1. A bin denoising scheme. While the eigenvector-based partition procedure detailed above yields 
accurate results for the CS-I in (2.2), this procedure alone is not sufficient for obtaining a satisfactory partition 
for the more complex CS-II considered in (2.7), as illustrated by the high corresponding 0-score (0 = 7206) 
shown in Figure 4.4(b). In Figure 4.4(c) we zoom into some of the recovered bins, showing that the eigenvector- 
based reconstruction splits some of the inner bins, which explains the high associated 0-score. In other words, 
states/bins which in the ground truth solution correspond to the same values of the slow state variable, are 
divided into two adjacent bins, and mistaken for two distinct states of the slow variable. 

To solve this issue, we propose a bin-denoising heuristic that robustly assigns data points to their respective 
bins. In hindsight, the continuity of the eigenfunctions of the Laplacian should be reflected in the continuity 
of the histogram of state counts in bins corresponding to adjacent intervals. We detail in Algorithm 1 an 
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Algorithm 1 Bin-merging algorithm: 


1 

Initialize FLAG = TRUE 


2 

while FLAG is TRUE do 


3 

Compute a^:=e{Vi,...,ViU V^+i,... Vis\) , 

Vi = 1,2,... 5 — 1 using definition (4.5) 

4 

if min ai < Q{Vi,... ,V\s\) then 

2=1,2,...,|»S| —1 


5 

q = argmin ai 





6 

V := Vl, ... ,VqU Vq+l, .. .'P\s\ 


7 

\S\ = \S\-1 


8 

else 


9 

FLAG = FALSE 


10 

end if 


11 

end while 




(d) 

Max matching in the Jaccard Index matrix 
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Our recovered slow variables (1:314) 


Fig. 4.5. Illustrative example CS-II. (a) The eigenvector-based slow variable cardinality after truncating and bin denoising. 
The Theta score © is the smoothness measure of the bin cardinalities, defined in (4.5). (b) The heatmap of the pairwise Jaccard 

similarity matrix given by (4.4). (c) The correlation between the ordering of the ground truth slow variable and the eigenvector 
recovered slow variable, (d) The Jaccard index of the pairwise matched bins (from the maximum matehing). 


iterative heuristic procedure which, at each step, merges two adjacent bins such that the resulting 0-score is 
minimized across all possible pairs of adjacent bins that can be merged. We show in Figure 4.5(a) the resulting 
bin cardinalities after the bin-merging heuristic and after truncating at the boundary of the slow variable. Note 
that the new denoised partition yields 0 = 103, and the number of bins (states of the slow variable) decreases 
from |iS| = 328 to |iS| = 314. Furthermore, in Figure 4.5(b) we compute the Jaccard similarity matrix between 
the ground truth and the newly obtained partition, showing in Figure 4.5(d) that we almost perfectly recover 
the structure of the ground truth bins. 

5. A Markov approach for computing the steady distribution of the slow variable. In this section, 
we focus on the final step of the ADM-CLE approach, of estimating the stationary distribution of the slow variable, 
without any prior knowledge of what the slow variable actually is. One of the ingredients needed along the way 
is an estimation of the conditional distribution P(F'|S' = s) of the fast variable F given a value s of the slow 
variable S, which we compute via two approaches. As the first approach, we consider the Conditional Stochastic 
Simulation Algorithm (CSSA) [8] which is given in Algorithm 2. It samples from the distribution of the fast 
variable conditioned on the slow variable. The second approach is entirely analytic and free of any stochastic 
simulations, and amounts to analytically solving the CME for each set in the partition V = {Vi,'P 2 , ■ • ■, ’Pfe}. We 
then compare our results to the Constrained Multiscale Algorithm (CMA) introduced in [8] , which approximates 
the effective dynamics of the slow variable as a SDE, after estimating the effective drift and diffusion using the 
CSSA (Algorithm 2). 

5.1. A stochastic simulation algorithm for estimating the conditional probability (CSSA). Our 

next task is to estimate the conditional distribution P(E|S' = s) of the fast variable F given a value s of the slow 
variable S. One possible approach for doing this relies on the CSSA algorithm to globally integrate the effective 
dynamics of the slow variable. One iteration of the CSSA is given in Algorithm 2. Ideally, one repeats steps 
1-6 of Algorithm 2 and samples values of F until the distribution P(F|S' = s) converges. In practice, we run 
Algorithm 2 until Lc changes of the slow variable S occur. This computation is done for each value in the range 
of the slow variable S = {si, S 2 ,..., S|s|}- 

5.2. An analytical derivation of the conditional distribution. An alternative approach which we fol¬ 
low in this paper relies on an analytical computation of the conditional distribution P(F'|S' = s), thus eliminating 
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Algorithm 2 One iteration of the CSSA for computing the conditional distribution P(F|iS' = s) of the fast 
variable F given a value s of the slow variable S 

m 

1: Compute the propensity functions ai{t), for i = 1,2,... ,m, and their sum ao(^) = E 

2: Generate ri and r 2 , two uniformly distributed random numbers in (0,1). 

3: Compute the next reaction time as t + r where r = — log(ri)/ao(t). 

4: Use r 2 to select reaction Rj which occurs at time t + r 

(each reaction Ri, i = 1,2,... ,m occurs with probability ai/ao). 

5: If the slow variable S changes its current state from s to s' 7 ^ s due to reaction Rj occurring, reset S' = s to 
its previous value, while not changing the value of the fast variable F. 

6: If any of the variables Xi goes outside the boundary of the considered domain, then revert to the state of 
the system in Step 4 before reaction Rj occurred. 


vAi 

^2 

s 


^2 

S' 

Rj 

aj / ao 

aj 

1 

3 

7 

2 

3 

8 

1 

7.8 X 10"® 

96 

1 

3 

7 

0 

3 

6 

2 

7.8 X 10"^ 

0.96 

1 

3 

7 

2 

3 

8 

3 

1.5 X 10-2 

184.38 

1 

3 

7 

0 

3 

6 

4 

1.6 X 10-3 

19.75 

1 

3 

7 

3 

2 

7 

6 

9.7 X 10-1 

12000 

3 

2 

7 

4 

2 

8 

1 

7.3 X 10-3 

64 

3 

2 

7 

2 

2 

6 

2 

2.0 X 10-4 

1.92 

3 

2 

7 

4 

2 

8 

3 

2.1 X 10-2 

184.38 

3 

2 

7 

2 

2 

6 

4 

6.7 X 10-3 

59.25 

3 

2 

7 

1 

3 

7 

5 

5.4 X 10-2 

480 

3 

2 

7 

5 

1 

7 

6 

9.1 X 10-1 

8000 

5 

1 

7 

6 

1 

8 

1 

5.4 X 10-3 

32 

5 

1 

7 

4 

1 

6 

2 

3.0 X 10-4 

1.6 

5 

1 

7 

6 

1 

8 

3 

3.1 X 10-2 

184.38 

5 

1 

7 

4 

1 

6 

4 

1.6 X 10-2 

98.75 

5 

1 

7 

3 

2 

7 

5 

2.7 X 10-4 

1600 

5 

1 

7 

7 

0 

7 

6 

6.7 X 10-4 

4000 

7 

0 

7 

8 

0 

8 

3 

5.0 X 10-2 

184.38 

7 

0 

7 

6 

0 

6 

4 

3.7 X 10-2 

138.25 

7 

0 

7 

5 

1 

7 

5 

9.1 X 10-4 

3360 


Table 5.1 


Illustrative example CS-II. The set of all ground states of the system {xi,X 2 ) corresponding to the slow variable S = X 1 + 2 X 2 = 
7. We denote by the states reachable from (xi,X 2 ) in one transition step, and by S' the associated corresponding slow 

variable such that S' = X[ + 2 X 2 . Rj denotes the reaction channel that takes the chemical system from state (xi,X 2 ) to (x^Xj), 
with corresponding propensity ctj. We highlight in bold letters the subset of all states via which the system can transition in one 
step from the slow variable S = 7 to S = 8. 


the need for any expensive stochastic simulations. We illustrate in Figure 5.1 the transition diagrams for the two 
chemical systems we consider in this paper. For chemical system CS-II, the system can transition from a given 
state (xi, X 2 ) to four adjacent distinct O-states: to {xi — 2 ,X 2 + 1) via channel R 5 , to {xi + 2 ,X 2 — 1) via channel 
Rq, to {xi — 1,2:2) via channels R2 and i?4, and finally to state (xi + 1,2:2) via channels i?i and R3. However, 
in terms of the underlying slow variable S, the system can transition to only two adjacent states S = s — 1 (via 
channels R2 and R4) and S' = s -I- 1 (via channels Ri and R3), or remain at the current state S = s, via channels 
i?5 and Rq. Considering the subsystem of fast reactions of CS-II and conditioning on the line s = 2:1 -I- 22:2, the 
stationary CME takes the form 

0 = ^5(2:1 -|- 2 )( 2 :i -|- 1 ) P(Ai = 2:1 -|- 2 , X2 = X2 — 1 ) 4 “ kg,(x 2 4 - 1) P(Ai = xi — 2 , X2 = X2 4 - 1 ) 

- (fc52:i(2;i - 1) 4 - fc62;2)P(Ai = 2:1, A2 = 2:2). 

Thus, the conditional distribution for CS-II is, for 0 < 2:1 < s, given by 

P(f = x,|S = ,) = —=y!(yV77)757(t;j ■ a(.-x.) is eve,, number. 
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Fig. 5.1. Transition diagrams for the two chemical systems: (a) CS-I; and (b) CS-II. 
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Fig. 5.2. The Markov chain on the slow variable state space, using the aggregated transition probabilities (5.1)-(5.2) for the 
chemical system CS-II. 


Here, C is the normalization constants and P(F' = xil^ = s) = 0 if (s — xi) is an odd number. 

A similar argument can be used for CS-L The stationary CME of the fast subsystem of CS-I is written as 


0 — (^1 T 1) P(A^i — Xi -h 1, A 2 — X 2 — 1) -h ^3 {x 2 1) P(A^i — Xi — 1, X 2 — X 2 1) 

- ik 2 Xi + k^X2) P(2Ci = Xi,x = X2)- 


where s = [xi + a; 2 )/ 2 . Thus, the conditional distribution for CS-I is, for 0 < xi < 2s, given by 


F{F = xi|S' = s) 


Xi\x2\ yksj 


c 

xi!(2s — xi)\ 



where C is again the normalization constant. 

5.3. Aggregated transition rates and a Markov Chain on the state of slow variables. In the final 
step of the ADM-CLE approach, we set up a Markov chain on the state space of slow variables with the end 
goal of estimating the stationary distribution of the slow variable. As illustrated in Figure 5.1(b), the system 
CS-II can can transition from a given state S' = s to two adjacent states S = s — 1 (via reaction channels i ?2 
and R 4 ) and S = s -I- 1 (via channels Ri and S 3 ), or it can remain at the current state S = s, via channels S 5 
and Rq. Consider now the set Vs = {x^) = [xi,X 2 Y'^^\xi -\- 2 x 2 = s}, illustrated as the middle bin in Figure 5.2. 
To compute the transition rate between two adjacent bins Vs and Vs+i, one has to aggregate over possible ways 

is) 

of getting from an observable state in bin Vs to an observable state in bin Vs+i- We compute 0) to be the 
aggregated transition rate from state Vs to state Vs+i^ over all possible states (xbl^x^^)), such that xb) g Rg 
and x^-J) G Vs+i, by 


m 

x(Og-P, x(j)gT',+i fc=l 


(5.1) 


where Q(xb),x('^^, Sfe) denotes the indicator functions of whether one can transition from the O-state xb) to 
O-state xb) via reaction Rk- We define similarly the aggregated transition rate 0|, that the chemical system 
transitions from the slow variable state Vs to Vs-i by 

m 

® 2 = E E = = ( 5 - 2 ) 

x(Og7:>^ x(j)gP»_i k=l 


We illustrate in Figure 5.3 the aggregated transition rates between the slow state S = s and its adjacent states 
S' = s — 1 and S = s -I- 1, for all values of the slow variable S. Note that in the derivations (5.1) and (5.2), we 
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Fig. 5.3. Plot of the aggregated transition 


rates for the illustrative example CS-II: (a) ©i; (b) © 2 ; and (c) ©1 — © 2 . 


(a) 


(b) 


Error = 0.028239 


Error = 0.0033892 



Fig. 5.4. The final estimated stationary distribution of the slow variable S for the ADM-CLE approach, computed without 
knowledge of the slow variable, for (a) the chemical system CS-I; and (b) the chemical system CS-II. (blue histograms). Red solid 
lines are exact solutions computed by solving the CME of the full model and using the corresponding definition of the slow variable. 


can either rely on the CSSA algorithm to sample from the conditional distribution of fast variables given values 
for the slow variables, as shown in Section 5.1, or use the analytic formulation which is possible to derived for 
both CS-I and CS-II, see Section 5.2. 

Finally, we compute the solution to the stationary CME associated to the system in Figure 5.2 which can be 
written as 


0 = 0 ® V(s - 1 ) + 02'''^7r(s + 1) - (01 + 02 ) 


(5.3) 


where 7r(s) « P(5' = s) is the probability that S' = s at time t. Assuming that 7r(s) = 0 for all s ^ S and 
using no-flux boundary conditions, we arrive at a linear system. The eigenvector of the resulting matrix, with 
associated eigenvalue A = 0, yields an approximate solution of the stationary CME, which we plot in blue in 
Figure 5.4. Our result is visually indistinguishable from the exact solution (plotted as the red solid line). 

5.4. A comparison with the Constrained Multiscale Algorithm (CMA). We compare the approach 
we introduced in the previous Section 5.3 with the CMA method proposed in [8]. We compare the results of the 
two methods with the ground truth, and record the error defined as 


Error |^7r,P(S' = s)^ = |7’'(s) — P(S = s) , 

s£S 


(5.4) 


where P(S = s) denotes the ground truth probability distribution of the slow, and tt denotes the estimated 
solution, either by the CMA and or the ADM-CLE. As Table 5.2 shows, the ADM-CLE approach yields lower 
errors compared to the CMA algorithm, even when we run the latter with the parameter Lc as large as 20,000. 
Note that for the chemical system CS-I, the ground truth probability distribution of the slow variable P(S = s) 
can be easily computed using the multivariate Poisson distribution, as discussed in Section 2.1. For the second 
chemical system CS-II, we consider as ground truth the solution obtained by solving the associated CME of the 
full model in a large (truncated) domain. 
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Lc = 100 

Lc = 500 

Lc = 2,000 

Lc = 5,000 

Lc = 10,000 

Lc = 20,000 

ADM-CLE 

CS-I 

0.21768 

0.089228 

0.10723 

0.045092 

0.040542 

0.066988 

0.003389 

CS-II 

0.66641 

0.49063 

0.14206 

0.070711 

0.12937 

0.10995 

0.028239 


Table 5.2 


The distance (as measured by the error in (5.4)J between the estimated and the ground truth probability distributions of the 
slow variable, for the CM A algorithm which runs the CSS A algorithm for each value of the slow variable S = s, until Lc changes 
of the slow variable occur. The rightmost column shows the recovery errors for our proposed Markov-based approach. 


Error = 0.40385 


Error = 0.055016 


Error = 0.053468 
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(c) L = 10, 000 
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(e) L = 100 


(f) L = 1,000 


(g) L = 10, 000 


(h) L = 100,000 


Fig. 5.5. (a)-(d) The stationary distribution of the slow variable computed by the CMA for CS-I, using knowledge of the slow 

variable (blue histograms). The red solid line is the exact solution, P(S = s), obtained by solving the CME of the full system CS-I. 
The CMA approach runs the CSSA algorithm for each value of the slow variable S = s until Lc = {10^, 10^, 10^, 10®} changes of 
the slow variable occur. Panels (e)-(f) show the CMA-computed distribution after smoothing out by the Kernel Density Estimation 
procedure. 


In Table 5.2 we show numerical results that highlight the accuracy improvement of the ADM-CLE approach 
compared to the CMA approach of [8]. For the latter method, we run the CSSA algorithm [8] for each value of 
the slow variable, until Lc changes of the slow variable occur. As expected, the accuracy of the CMA algorithm 
improves as Lc increases, at the cost of additional computational running time of the method. In comparison, 
our stochastic simulation free approach yields significantly more accurate results, with errors that are at least 
one order of magnitude lower than the CMA method with L = 20, 000. We plot in Figures 5.5 and 5.6 (top 
rows) the estimated stationary distribution of the CMA method for both chemical systems considered throughout 
this paper, for several values of the L parameter. The bottom rows of the same Figures 5.5 and 5.6 show the 
estimated distribution, after smoothing out by the Kernel Density Estimation (KDE). 

6. Summary and discussion. In this paper we have introduced an ADM-CLE approach for detecting 
intrinsic slow variables in high-dimensional dynamic data, generated by stochastic dynamical systems. In the 
original ADM framework, the local bursts of simulations initiated at each data point to estimate the local 
covariances are computationally expensive, a shortcoming we avoid by using an approximation of the CLE. A 
second innovation that further improved the computational performance relates to the underlying similarity 
graph, a starting point for the diffusion map approach. By exploiting the spectrum of each local covariance 
matrix, we built a sparse ellipsoid-like neighborhood graph at each point in the data set, with the end result of 
being able to build a sparse similarity graph that requires the computation of a much smaller number of distances, 
which makes the ADM-CLE approach scalable to networks with thousands or even tens of thousands of nodes. 
For the two illustrative examples considered in this paper, the size of the resulting graphs is A^ = 10, 201 for 
CS-I, and N = 12,100 for CS-II, respectively. Had these graphs been complete graphs, the number of resulting 
weighted edges would be over 50 million, while in our computations, the number of edges is approximately 2.9 
million for CS-I, and 3.9 million for CS-II. 

We have proposed a spectral-based method for inferring the slow variable present within the chemical sys- 
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Fig. 5.6. (a)—(d) The stationary distribution of the slow variable computed by the CMA for CS-II, using knowledge of the slow 

variable (blue histograms). The red solid line is the exact solution, P(S = s), obtained by solving the CME of the full system CS-II. 
The CMA approach runs the CSSA algorithm for each value of the slow variable S = s until Lc = {10^, 10^, 10^, 10®} changes of 
the slow variable occur. Panels (e)-(f) show the CMA-computed distribution after smoothing out by the Kernel Density Estimation 
procedure. 


tern without any prior knowledge of its structure, and a Markov-based approach for estimating its stationary 
distribution. We augment the proposed algorithmic approach with numerical simulations that confirm that the 
ADM-CLE approach can compare favorably for some systems to the CMA for estimating the stationary distri¬ 
bution of such slow variables. The ADM-CLE approach can also be applied to systems with a low number of 
states of slow variables. The CMA, as introduced in [8], is more suitable for systems where the slow variable(s) 
can take many different values, because the CMA uses an underlying SDE approximation for the behaviour of 
the slow variables. One option to overcome this problem is to estimate effective propensity functions of the slow 
subsystem [9]. An open question is to extend the ADM-CLE to systems where the range of the Xi variables is 
very large. In the ADM-CLE approach applied to CS-I or CS-II, we associate a state (i.e., node in the initial 
graph) to each possible combination of pairs of states (xi,X 2 ), an approach no longer feasible whenever the range 
of the variables is large. To bypass this problem, one could change the discretization of the state space and 
modify accordingly the Markov chain based approach (Figure 5.2) used in the ADM-CLE. 

The ADM-CLE couples a method for finding slow variables (ADM) with an approach to compute the station¬ 
ary distribution of a multiscale chemical reaction network. Chemical systems depend on a number of parameters 
(e.g. kinetic rate constants) and an open question is to extend the ADM approach to situations where one (or 
more) parameters are varied, i.e. to perform bifurcation analysis of multiscale stochastic chemical systems [30]. 

Finally, we point out recent work of Dsilva et al. [11], who also rely on the ADM framework to discover 
nonlinear intrinsic variables in high-dimensional data in the context of multiscale simulations, with the task 
of merging different simulations ensembles or partial observations of such ensembles in a globally consistent 
manner. Their work is motivated by the fact that often one is not merely interested in extracting the hidden 
(slow) variables from the underlying low-dimensional manifold, given partial observations i = 1,2 ,..., as 
in the ADM setting, but also in extending high-dimensional functions on a set of points lying in a low-dimensional 
manifold. Their proposed approach relies on the so called Laplacian Pyramids [35], a multiscale algorithm for 
extending a high-dimensional function defined on a set of points in the space of intrinsic variables to a second 
set of points not in the data set, by using Laplacian kernels of decreasing bandwidths (the e parameter in (3.5)). 
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